Abstract. With the emergence of XML as a standard for representing
business data, new decision support applications are being developed.
These XML data warehouses aim at supporting On-Line Analytical
Processing (OLAP) operations that manipulate irregular XML data. To
ensure the feasibility of these new tools, important performance issues must be
addressed. Performance is customarily assessed with the help of
benchmarks. However, decision support benchmarks do not currently support
XML features. In this paper, we introduce the XML Warehouse
Benchmark (XWeB), which aims at filling this gap. XWeB derives from the
relational decision support benchmark TPC-H. It is mainly composed
of a test data warehouse that is based on a unified reference model for
XML warehouses and that features XML-specific structures, and its
associated XQuery decision support workload. XWeB's usage is illustrated
by experiments on several XML database management systems.

1 Introduction

With the increasing volume of XML data available, and XML now being a
standard for representing complex business data [2], XML data sources that are
pertinent for decision support are ever more numerous. However, XML data bear
irregular structures (e.g., optional and/or diversely ordered elements, ragged
hierarchies, etc.) that would be intricate to handle in a relational Database
Management System (DBMS). Therefore, many efforts toward XML data
warehousing have been made [14,17,29], as well as efforts for extending the XQuery
language with On-Line Analytical Processing (OLAP) capabilities [9,12,26].

XML-native DBMSs supporting XQuery should naturally form the basic
storage component of XML warehouses. However, they currently exhibit relatively
poor performance when dealing with the large data volumes and complex
analytical queries that are typical in data warehouses, and are thus challenged
by relational, XML-compatible DBMSs. A tremendous amount of research is
currently in progress to help them become a credible alternative, though. Since
performance is a critical issue in this context, its assessment is essential.



Database performance is customarily evaluated experimentally with the help
of benchmarks. However, existing decision support benchmarks [7,15,16,24] do
not support XML features, while XML benchmarks [4,5,20,28] target
transactional applications and are ill-suited to evaluate the performance of
decision-oriented applications. Their database schemas do not bear the multidimensional
structure that is typical in data warehouses (i.e., star schemas and derivatives
bearing facts described by dimensions [11]); and their workloads do not feature
typical, OLAP-like analytic queries.

Therefore, we present in this paper the first (to the best of our knowledge)
XML decision support benchmark. Our objective is to propose a test XML
data warehouse and its associated XQuery decision support workload, for
performance evaluation purposes. The XML Warehouse Benchmark (XWeB) is based
on a unified reference model for XML data warehouses [13]. An early version of
XWeB [13] was derived from the standard relational decision support benchmark
TPC-H [25]. In addition, XWeB's warehouse model has now been complemented
with XML-specific irregular structures, and its workload has been both adapted
accordingly and expanded.

The remainder of this paper is organized as follows. In Section 2, we present
and discuss related work regarding relational decision support and XML
benchmarks. In Section 3, we recall the XML data warehouse model XWeB is based
on. In Section 4, we provide the full specifications of XWeB. In Section 5, we
illustrate our benchmark's usage by experimenting on several XML DBMSs. We
finally conclude this paper and provide future research directions in Section 6.

2 Related Work 

2.1 Relational Decision Support Benchmarks 

The OLAP APB-1 benchmark was very popular in the late nineties [15].
Issued by the OLAP Council, a now inactive organization founded by four OLAP
solution vendors, APB-1's data warehouse schema is structured around Sale facts
and four dimensions: Customer, Product, Channel and Time. Its workload of ten
queries aims at sale forecasting. Although APB-1 is simple to understand and
use, it proves limited, since it is not "differentiated to reflect the hurdles that 
are specific to different industries and functions" [^ . 

Nowadays, the Transaction Processing Performance Council (TPC) defines
standard benchmarks and publishes objective and verifiable performance
evaluations to the industry. The TPC currently supports one decision support
benchmark: TPC-H [25]. TPC-H's database is a classical product-order-supplier model.
Its workload is constituted of twenty-two SQL-92, parameterized, decision sup- 
port queries and two refreshing functions that insert tuples into and delete tuples 
from the database, respectively. Query parameters are randomly instantiated fol- 
lowing a uniform law. Three primary metrics are used in TPC-H. They describe 
performance in terms of power, throughput, and a combination of these two cri- 
teria. Power and throughput are the geometric and arithmetic mean values of 
database size divided by workload execution time, respectively. 



Although decision-oriented, TPC-H's database schema is not a typical
star-like data warehouse schema. Moreover, its workload does not include any explicit
OLAP query. The TPC-DS benchmark, which is currently in its final stages of
development, addresses these shortcomings [24]. TPC-DS' schema represents the decision
support functions of a retailer in the form of a constellation schema with several
fact tables and shared dimensions. TPC-DS' workload is made up of four
classes of queries: reporting queries, ad-hoc decision support queries, interactive
OLAP queries, and extraction queries. SQL-99 query templates help randomly
generate a set of about five hundred queries, following non-uniform
distributions. The warehouse maintenance process includes a full Extract, Transform
and Load (ETL) phase, and handles dimensions with respect to their nature
(non-static dimensions scale up while static dimensions are updated). One
primary throughput metric is proposed in TPC-DS to take both query execution
and the maintenance phase into account.

More recently, the Star Schema Benchmark (SSB) has been proposed as a
simpler alternative to TPC-DS [16]. Like our early version of XWeB [13], it is
based on TPC-H's database remodeled as a star schema. It is essentially
structured around an order fact table merged from two TPC-H tables. But more
interestingly, SSB features a query workload that provides both functional and
selectivity coverage.

As in all TPC benchmarks, scaling in TPC-H, TPC-DS and SSB is achieved
through a scale factor SF that helps define database size (from 1 GB to 100 TB).
Both database schema and workload are fixed. The number of generated queries 
in TPC-DS also directly depends on SF. TPC standard benchmarks aim at 
comparing the performances of different systems in the same experimental con- 
ditions, and are intentionally not very tunable. By contrast, the Data Warehouse 
Engineering Benchmark (DWEB) helps generate various ad-hoc synthetic data 
warehouses (modeled as star, snowflake, or constellation schemas) and workloads 
that include typical OLAP queries [7]. DWEB targets data warehouse design- 
ers and allows testing the effect of design choices or optimization techniques in 
various experimental conditions. Thus, it may be viewed more like a benchmark 
generator than an actual, single benchmark. DWEB's main drawback is that its 
complete set of parameters makes it somewhat difficult to master. 

Finally, to be complete, TPC-H and TPC-DS have recently been judged
insufficient for ETL purposes [21], and specific benchmarks for ETL workflows have
been announced [21,27].

2.2 XML Benchmarks 

XML benchmarks may be subdivided into two families. On one hand,
micro-benchmarks, such as the Michigan Benchmark (so-named in reference to the
relational Wisconsin Benchmark developed in the eighties) [19] and MemBeR [1],
help designers of XML document storage solutions isolate critical issues to optimize.
More precisely, micro-benchmarks aim at assessing the individual performance
of basic operations such as projection, selection, join and aggregation. These
low-level benchmarks are obviously too specialized for decision support application
evaluation, which requires testing complex queries at a more global level.

On the other hand, application benchmarks help users compare the global
performance of XML-native or compatible DBMSs, and more particularly of
their query processor. For instance, X-Mach1 [4], XMark [20], XOO7 (an
extension of the object-oriented benchmark OO7) [5] and XBench [28] are application
benchmarks. Each implements a mixed XML database that is both data-oriented
(structured data) and document-oriented (in general, random texts built from a
dictionary). However, except for XBench, which proposes a true mixed database,
their orientation is either more particularly focused on data (XMark, XOO7) or
documents (X-Mach1).

These benchmarks also differ in: the fixed or flexible nature of the XML
schema (one or several Document Type Definitions - DTDs - or XML Schemas);
the number of XML documents used to model the database at the physical level
(one or several); the inclusion or not of update operations in the workload. We
can also underline that only XBench helps evaluate all the functionalities
offered by the XQuery language. Unfortunately, none of these benchmarks exhibits
any decision support feature. This is why the relational benchmarks presented in
Section 2.1 are more useful to us in a first step.

3 Reference XML Warehouse Model 

Existing XML data warehouse architectures more or less converge toward a
unified model. They mostly differ in the way dimensions are handled and the number
of XML documents that are used to store facts and dimensions. Searching for
the best compromise in terms of query performance and modeling power, we
proposed a unified model [14] that we reuse in XWeB. Like XCube [10], our
reference XML warehouse is composed of three types of XML documents at the
physical level: document dw-model.xml defines the multidimensional structure
of the warehouse (metadata); each facts_f.xml document stores information
related to a set of facts f (several fact documents allow constellation schemas); each
dimension_d.xml document stores a given dimension d's member values for any
hierarchical level.

More precisely, dw-model.xml's structure (Figure 1) bears two types of nodes:
dimension and FactDoc nodes. A dimension node defines one dimension, its
possible hierarchical levels (Level elements) and attributes (including types), as
well as the path to the corresponding dimension_d.xml document. A FactDoc
element defines a fact, i.e., its measures, references to the corresponding
dimensions, and the path to the corresponding facts_f.xml document. The facts_f.xml
documents' structure (Figure 2(a)) is composed of fact subelements that each
instantiate a fact, i.e., measure values and dimension references. These
identifier-based references support the fact-to-dimension relationships.

Finally, the dimension_d.xml documents' structure (Figure 2(b)) is composed
of Level nodes. Each of them defines a hierarchy level composed of instance




Fig. 1. dw-model.xml graph structure 



nodes. An instance defines the member attributes of a hierarchy level as well as
their values.
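
To make the three-document model more concrete, the following Python sketch joins a miniature fact document to a dimension document through identifier-based references and sums a measure per dimension member. It is only an illustration: the attribute names (dim, ref, id, value), the exact element nesting, and the sample data are assumptions chosen for this example, not XWeB's actual document schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature documents; tag nesting and attribute names are assumptions.
FACTS_XML = """
<FactDoc id="Sale">
  <fact>
    <dimension dim="Customer" ref="c1"/>
    <measure id="Quantity" value="10"/>
  </fact>
  <fact>
    <dimension dim="Customer" ref="c2"/>
    <measure id="Quantity" value="5"/>
  </fact>
  <fact>
    <dimension dim="Customer" ref="c1"/>
    <measure id="Quantity" value="7"/>
  </fact>
</FactDoc>
"""

DIMENSION_XML = """
<dimension id="Customer">
  <Level id="Customer">
    <instance id="c1"><attribute id="c_name" value="Alice"/></instance>
    <instance id="c2"><attribute id="c_name" value="Bob"/></instance>
  </Level>
</dimension>
"""


def member_names(dimension_xml, level_id, attr_id):
    """Map each instance identifier of one hierarchy level to an attribute value."""
    names = {}
    for level in ET.fromstring(dimension_xml).findall("Level"):
        if level.get("id") != level_id:
            continue
        for inst in level.findall("instance"):
            for attr in inst.findall("attribute"):
                if attr.get("id") == attr_id:
                    names[inst.get("id")] = attr.get("value")
    return names


def quantity_by_customer(facts_xml, names):
    """Join facts to dimension members by reference and sum Quantity."""
    totals = {}
    for fact in ET.fromstring(facts_xml).findall("fact"):
        ref = fact.find("dimension[@dim='Customer']")
        qty = fact.find("measure[@id='Quantity']")
        if ref is None or qty is None:
            continue  # skip facts with a missing reference or measure
        name = names.get(ref.get("ref"))
        totals[name] = totals.get(name, 0) + int(qty.get("value"))
    return totals


names = member_names(DIMENSION_XML, "Customer", "c_name")
totals = quantity_by_customer(FACTS_XML, names)
```

Keeping facts and dimensions in separate documents means that every analytical query has to perform this kind of reference-based join, which is the operation a benchmark built on this model mainly stresses.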



4 XWeB Specifications 
4.1 Principle 

XWeB derives from TPC-H, modified in a number of ways explained in the
following sections, for three reasons. First, we acknowledge the importance of
TPC benchmarks' standard status. Hence, our goal is to have XWeB inherit
from TPC-H's wide acceptance and usage (whereas TPC-DS is still under
development). Second, from our experience in designing the DWEB relational data
warehouse benchmark, we learned that Gray's simplicity criterion for a good
benchmark is essential. This is again why we preferred TPC-H, which is
much simpler than TPC-DS or DWEB. Third, from a purely practical point of
view, we also selected TPC-H to benefit from its data generator, dbgen, a feature
that does not exist in TPC-DS yet.

The main components in a benchmark are its database and workload models.
XWeB's are described in Sections 4.2 and 4.3, respectively. In a first step, we
do not propose to include ETL features in XWeB, although XQuery has been
complemented with update queries recently. ETL is indeed a complex process
that presumably requires dedicated benchmarks [21]. Moreover, the following
specifications already provide a raw loading evaluation framework. The XWeB
warehouse is indeed a set of XML documents that must be loaded into an XML
DBMS, an operation that can be timed.




Fig. 2. facts_f.xml (a) and dimension_d.xml (b) graph structures



4.2 Database Model 

Schema. At the conceptual level, like O'Neil et al. in SSB, we remodel TPC-H's
database schema as an explicit multidimensional (snowflake) schema (Figure 3),
where Sale facts are described by the Part/Category, Customer/Nation/Region,
Supplier/Nation/Region and Day/Month/Year dimensions.

The Part/Category hierarchy, which is not present in TPC-H, is of
particular interest. It is indeed both non-strict and non-covering [23]. Beyer et al.
would term it ragged [2]. We prefer the term complex since ragged hierarchy has
different meanings in the literature; e.g., Rizzi defines it as non-covering only
[18]. More precisely, in our context, non-strictness means relationships between
parts and categories, and between categories themselves, are many-to-many.
Non-coveringness means parts and subcategories may roll up to categories at
any higher granularity level, i.e., skipping one or more intermediary granularity
levels. Complex hierarchies do exist in the real world and are easy to implement in
XML, whereas they would be intricate to handle in a relational system [2].
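
The two properties can be illustrated with a small Python sketch that stores rollup relationships as a many-to-many mapping and collects every category a part ultimately rolls up to. The part identifier and the specific edges are hypothetical and only meant to show non-strictness (several parents) and non-coveringness (a part attached directly to a coarse category).

```python
# Hypothetical rollup edges: each node maps to the set of categories it rolls up to.
ROLLUP = {
    # Non-strict: the part has two parents. Non-covering: STEEL is reached
    # directly, skipping any intermediate category level.
    "part42": {"BRUSHED", "STEEL"},
    # Non-strict between categories: BRUSHED has two parent categories.
    "BRUSHED": {"NICKEL", "STEEL"},
}


def ancestors(node, rollup):
    """Return every category reachable from node by following rollup edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in rollup.get(current, ()):
            if parent not in seen:  # a category reachable by several paths is kept once
                seen.add(parent)
                stack.append(parent)
    return seen


part_categories = ancestors("part42", ROLLUP)
```

Because STEEL is reachable both directly and through BRUSHED, a query that aggregates along such a hierarchy must take care not to count the same fact twice for one category.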

At the logical level, the UML class diagram from Figure 3 translates into an
instance of dw-model.xml (Figure 4). Attributes (fact measures and dimension
members) are not mentioned in Figure 4 for brevity, but they are present in the
actual document.

Finally, at the physical level, fact and dimension instances are stored in
a set of XML documents, namely facts_1.xml = f_sale.xml, dimension_1.xml =
d_date.xml, dimension_2.xml = d_part.xml, dimension_3.xml = d_customer.xml and
dimension_4.xml = d_supplier.xml. To introduce further XML-specific features in
XWeB, f_sale.xml's DTD allows missing dimension references and measures, as

Fig. 3. XWeB warehouse's conceptual schema



well as any order in fact subelements. Our aim here is to introduce a measure of 
"dirty data" in the benchmark. 

Parameterization. XWeB's main parameters basically help control data ware- 
house size. Size (S) depends on two parameters: the scale factor (SF) inherited 
from TPC-H, and density D. When D = 1, all possible combinations of dimen- 
sion references are present in the fact document (Cartesian product), which is 
very rare in real-life data warehouses. When D decreases, we progressively elim- 
inate some of these combinations. D actually helps control the overall size of 
facts independently from the size of dimensions. 

S can be estimated as follows: $S = S_{dimensions} + S_{facts}$, where $S_{dimensions}$ is
the size of dimensions, which does not change when SF is fixed, and $S_{facts}$ is the
size of facts, which depends on D. $S_{dimensions} = \sum_{d \in \mathcal{D}} |d|_{SF} \times nodesize(d)$ and


<?xml version="1.0" encoding="UTF-8"?>
<xweb-dw-model>
  <fact id="Sale" path="f_sale.xml"/>
  <dimension id="Date" path="d_date.xml">
    <level id="Day" rollup="Month" drilldown=""/>
    <level id="Month" rollup="Year" drilldown="Day"/>
    <level id="Year" rollup="" drilldown="Month"/>
  </dimension>
  <dimension id="PartDim" path="d_part.xml">
    <level id="Part" rollup="Category" drilldown=""/>
    <level id="Category" rollup="Category" drilldown="Part Category"/>
  </dimension>
  <dimension id="CustomerDim" path="d_customer.xml">
    <level id="Customer" rollup="C_Nation" drilldown=""/>
    <level id="C_Nation" rollup="C_Region" drilldown="Customer"/>
    <level id="C_Region" rollup="" drilldown="C_Nation"/>
  </dimension>
  <dimension id="SupplierDim" path="d_supplier.xml">
    <level id="Supplier" rollup="S_Nation" drilldown=""/>
    <level id="S_Nation" rollup="S_Region" drilldown="Supplier"/>
    <level id="S_Region" rollup="" drilldown="S_Nation"/>
  </dimension>
</xweb-dw-model>

Fig. 4. XWeB warehouse's logical schema 

$S_{facts} = \prod_{d \in \mathcal{D}} |h_1^d|_{SF} \times D \times fact\_size$, where $\mathcal{D}$ is the set of dimensions, $|d|_{SF}$
the total size of dimension d (i.e., all hierarchy levels included) w.r.t. SF, $|h_1^d|_{SF}$
the size of the coarsest hierarchy level in dimension d w.r.t. SF, $nodesize(d)$ the
average node size in dimension_d.xml, and $fact\_size$ the average fact element
size. For example, when SF = 1 and D = 1, with node sizes all equal to 220
bytes, the size of f_sale.xml is 2065 GB. Finally, two additional parameters
control the probability of missing values (Pm) and element reordering (Po) in
facts, respectively.
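
The size estimate can be sketched in a few lines of Python. This is a minimal illustration of the two formulas as written in the text; the function name, the argument layout and the toy figures in the usage lines are assumptions made for this example and do not correspond to actual TPC-H cardinalities.

```python
from math import prod


def estimate_size(dim_sizes, level_sizes, node_sizes, density, fact_size):
    """Estimate warehouse size in bytes from the formulas in the text.

    dim_sizes[d]   -- total number of nodes in dimension d, |d|_SF
    level_sizes[d] -- size of the hierarchy level entering the product, |h_1^d|_SF
    node_sizes[d]  -- average node size in dimension d's document, in bytes
    density        -- D, the fraction of the Cartesian product that is kept
    fact_size      -- average size of one fact element, in bytes
    """
    s_dimensions = sum(dim_sizes[d] * node_sizes[d] for d in dim_sizes)
    s_facts = prod(level_sizes.values()) * density * fact_size
    return s_dimensions, s_facts, s_dimensions + s_facts


# Toy figures, chosen only to keep the arithmetic easy to follow.
s_dim, s_facts, s_total = estimate_size(
    dim_sizes={"date": 10, "part": 20},
    level_sizes={"date": 5, "part": 4},
    node_sizes={"date": 100, "part": 100},
    density=0.5,
    fact_size=200,
)
```

With these toy values, dimensions account for 3000 bytes and facts for 2000 bytes. Halving D halves only the fact term, which is how density tunes fact volume independently of dimension size.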

Schema Instantiation. The schema instantiation process is achieved in two
steps: first, we build dimension XML documents, and then the fact document.
Dimension data are obtained from dbgen as flat files. Their size is tuned by SF.
Dimension data are then matched to the dw-model.xml document, which contains
dimension specifications, hierarchical levels and attribute names, to output the
set of dimension_d.xml ($d \in \mathcal{D}$) documents. d_part.xml, which features a complex
hierarchy, is a particular case that we focus on.

The algorithm from Figure 5 describes how categories are assigned to parts from
d_part.xml. First, category names are taken from TPC-H and organized in three
arbitrary levels in the cat table. Moreover, categories are interrelated through
rollup and drill-down relationships to form a non-strict hierarchy. For example,
level-2 category BRUSHED rolls up to level-1 categories NICKEL and STEEL,
and drills down to level-3 categories ECONOMY, STANDARD and SMALL. The
whole hierarchy extension is available on-line (Section 6). Then, to achieve
non-coveringness, we assign to each part p several categories at any level. p.catset
denotes the set of categories assigned to p. Each "root" category (numbering
from 1 to 3) is selected from a random level lvl. Then, subcategories may be
(randomly) selected from subsequent levels. Non-coveringness is achieved when
the initial level is lower than 3 and there is no subcategory. ncat and nsubcat refer
to category and subcategory numbers, respectively. cand denotes a candidate
category or subcategory. |cat[i]| is the number of elements in table cat's i-th
level.



cat := [[BRASS, COPPER, NICKEL, STEEL, TIN],                 // level 1
        [ANODIZED, BRUSHED, BURNISHED, PLATED, POLISHED],    // level 2
        [ECONOMY, LARGE, MEDIUM, PROMO, SMALL, STANDARD]]    // level 3
FOR ALL p IN d_part DO
   p.catset := EMPTY_SET
   ncat := RANDOM(1, 3)
   FOR i := 1 TO ncat DO
      lvl := RANDOM(1, 3)
      REPEAT
         cand := cat[lvl, RANDOM(1, |cat[lvl]|)]
      UNTIL cand NOT IN p.catset
      p.catset := p.catset UNION cand
      nsubcat := RANDOM(0, 3 - lvl)
      FOR j := 1 TO nsubcat DO
         cand := cat[lvl + j, RANDOM(1, |cat[lvl + j]|)]
         IF cand NOT IN p.catset THEN
            p.catset := p.catset UNION cand
         END IF
      END FOR
   END FOR
END FOR



Fig. 5. Part category selection algorithm 
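
As an executable counterpart to Figure 5, the following Python sketch performs the same selection for a single part. It is one possible transcription, not the benchmark's own code: the function name and the use of a seeded random.Random instance (for reproducibility) are assumptions.

```python
import random

# Category table from Figure 5, one list per hierarchy level.
CAT = [
    ["BRASS", "COPPER", "NICKEL", "STEEL", "TIN"],                   # level 1
    ["ANODIZED", "BRUSHED", "BURNISHED", "PLATED", "POLISHED"],      # level 2
    ["ECONOMY", "LARGE", "MEDIUM", "PROMO", "SMALL", "STANDARD"],    # level 3
]


def select_categories(rng):
    """Return the set of categories assigned to one part (p.catset)."""
    catset = set()
    ncat = rng.randint(1, 3)                     # number of "root" categories
    for _ in range(ncat):
        lvl = rng.randint(1, 3)                  # level of this root category
        while True:                              # REPEAT ... UNTIL cand NOT IN catset
            cand = rng.choice(CAT[lvl - 1])
            if cand not in catset:
                break
        catset.add(cand)
        nsubcat = rng.randint(0, 3 - lvl)        # 0 subcategories gives a non-covering rollup
        for j in range(1, nsubcat + 1):
            # set.add ignores duplicates, which replaces the explicit IF test
            catset.add(rng.choice(CAT[lvl + j - 1]))
    return catset


sample = select_categories(random.Random(2010))
```

Each part receives between one and nine categories: at most three roots, each followed by at most two subcategories.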



Facts are generated randomly with respect to the algorithm from Figure 6.
The number of facts depends on D, and data dirtiness on Pm and Po
(Section 4.2). D, Pm and Po are actually used as Bernoulli parameters. val is
a transient table that stores dimension references and measure values, to
allow them to be nullified and/or reordered without altering loop index values.
The SKEWED_RANDOM() function helps generate "hot" and "cold" values for
measures Quantity and TotalAmount, which influences range queries. Finally, the
SWITCH function randomly reorders a set of values.



FOR ALL c IN d_customer DO
   FOR ALL p IN d_part DO
      FOR ALL s IN d_supplier DO
         FOR ALL d IN d_date DO
            IF RANDOM(0, 1) <= D THEN
               // Measure random generation
               Quantity := SKEWED_RANDOM(1, 10000)
               TotalAmount := Quantity * p.p_retailprice
               // Missing values management
               val[1] := c.c_custkey; val[2] := p.p_partkey
               val[3] := s.s_suppkey; val[4] := d.d_datekey
               val[5] := Quantity; val[6] := TotalAmount
               FOR i := 1 TO 6 DO
                  IF RANDOM(0, 1) <= Pm THEN
                     val[i] := NULL
                  END IF
               END FOR
               // Dimension reordering
               IF RANDOM(0, 1) <= Po THEN
                  SWITCH(val)
               END IF
               WRITE(val) // Append current fact into f_sale.xml
            END IF
         END FOR
      END FOR
   END FOR
END FOR

Fig. 6. Fact generation algorithm
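
The generation loop of Figure 6 can be approximated in Python as follows. This sketch makes several assumptions that the text does not specify: the skew law of SKEWED_RANDOM (a cubic transform of a uniform draw is used here), the in-memory representation of a fact as a list of (name, value) pairs, and strict "<" comparisons so that probabilities 0 and 1 behave exactly. Writing the facts to f_sale.xml is left out.

```python
import random
from itertools import product


def skewed_random(rng, low, high):
    """Draw an integer in [low, high], favouring low values (assumed skew law)."""
    return low + int((high - low + 1) * rng.random() ** 3)


def generate_facts(customers, parts, suppliers, dates,
                   density, p_missing, p_reorder, rng):
    """Yield facts as lists of (name, value) pairs, following Figure 6.

    customers, suppliers and dates are lists of keys; parts is a list of
    dicts holding p_partkey and p_retailprice (a hypothetical layout).
    """
    for c, p, s, d in product(customers, parts, suppliers, dates):
        if not rng.random() < density:           # Bernoulli draw on D
            continue
        quantity = skewed_random(rng, 1, 10000)
        total = quantity * p["p_retailprice"]
        val = [("customer", c), ("part", p["p_partkey"]),
               ("supplier", s), ("date", d),
               ("Quantity", quantity), ("TotalAmount", total)]
        # Missing values: each entry is nullified with probability Pm.
        val = [(name, None if rng.random() < p_missing else v) for name, v in val]
        # Element reordering: the whole fact is shuffled with probability Po.
        if rng.random() < p_reorder:
            rng.shuffle(val)
        yield val


parts = [{"p_partkey": 1, "p_retailprice": 2.0},
         {"p_partkey": 2, "p_retailprice": 3.0}]
clean_facts = list(generate_facts([10, 11], parts, [100], [1, 2, 3],
                                  density=1.0, p_missing=0.0, p_reorder=0.0,
                                  rng=random.Random(0)))
```

With D = 1 and Pm = Po = 0, the generator returns the full Cartesian product of dimension keys in a fixed element order, which corresponds to the clean configuration used in Section 5.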



4.3 Workload Model 

Workload Queries and Parameterization. The XQuery language [3] allows 
formulating decision support queries, unlike simpler languages such as XPath. 
Complex queries, including aggregation operations and join queries over multiple 
documents, can indeed be expressed with the FLWOR syntax. However, we are 
aware that some analytic queries are difficult to express and execute efficiently 
with XQuery, which does not include an explicit grouping construct compara- 
ble to the GROUP BY clause in SQL [5]. Moreover, though grouping queries are 
possible in XQuery, there are many issues with the results [2]. We nonetheless 
select XQuery for expressing XWeB's workload due to its standard status. Fur- 
thermore, introducing difficult queries in the workload aims to challenge XML 
DBMS query engines. 

Although we do take inspiration from TPC-H and SSB, our particular XML
warehouse schema leads us to propose yet another query workload. It is
currently composed of twenty decision support queries labeled Q01 to Q20 that
basically are typical aggregation queries for decision support. Though we aim



to provide the best functional and selectivity coverage with this workload, we
lack experimental feedback, thus it is likely to evolve in the future. Workload
specification is provided in Table 1. Queries are presented in natural language
for space constraints, but their complete XQuery formulation is available on-line
(Section 6).

XWeB's workload is roughly structured in increasing order of query com- 
plexity, starting with simple aggregation, then introducing join operations, then 
OLAP-like queries such as near-cube (with superaggregates missing) calcula- 
tion, drill-downs (e.g., Q06 drills from Q05's Month down to Day granularity) 
and rollups (e.g., Q09 rolls from Q08's Customer up to Nation granularity), 
while increasing the number of dimensions involved. The last queries exploit the 
Part/Category complex hierarchy. We also vary the type of restrictions (by-value
and range queries), the aggregation function used, and the ordering applied to
queries. Ordering labeled by ~^ indicates a descending order (default being as- 
cending). Finally, note that Q20, though apparently identical to Q19, is a further 
rollup along the Category complex hierarchy. Actually, Q19 rolls up from Q18's
product level to the category level, and then Q20 rolls up to the "supercategory" 
level, with supercategories being categories themselves. 

Moreover, workload queries are subdivided into five categories: simple
reporting (i.e., non-grouping) queries; 1, 2, and 3-dimension cubes; and complex
hierarchy cubes. We indeed notice in our experiments (Section 5) that
complex queries are diversely handled by XML DBMSs: some systems have very
long response times, or even fail to answer. Subdividing the workload into
blocks allows us to adjust workload complexity, by introducing boolean
execution parameters (RE, 1D, 2D, 3D and CH, respectively) that define whether a
particular block of queries must be executed or not when running the benchmark
(see below).

Execution Protocol and Performance Metrics. Still with TPC-H as a
model, we adapt its execution protocol along two axes. First, since XWeB does
not currently feature update operations (Section 4.1), the performance test can
be simplified to executing the query workload. Second, as in DWEB, we allow
warm runs to be performed several times (parameter NRUN) instead of just
once, to allow averaging results and flattening the effects of any unexpected
outside event. Thus, the overall execution protocol may be summarized as follows:

1. load test: load the XML warehouse into an XML DBMS; 

2. performance test: 

(a) cold run executed once (to fill in buffers), w.r.t. parameters RE, 1D,
2D, 3D and CH;

(b) warm run executed NRUN times, still w.r.t. workload parameters. 

The only performance metric in XWeB is currently response time, as in SSB
and DWEB. Load test, cold run and warm runs are timed separately. Global,
average, minimum and maximum execution times are also computed, as well as
standard deviation. This kind of atomic approach for assessing performance makes it possible



Table 1. XWeB workload specification 

to derive any more complex, composite metrics, such as TPC-H's throughput
and power if necessary, while remaining simple.
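
The performance test (one cold run, then NRUN warm runs, restricted to the enabled query blocks) and its response-time statistics could be driven by a harness along the following lines. This is a hedged sketch rather than XWeB's actual Java interface: the function names, the (block, query id, query text) tuple layout and the execute callback, which stands in for a call to an XML DBMS, are all assumptions.

```python
import statistics
import time


def run_workload(queries, execute, enabled):
    """Run each enabled query once and return {query id: elapsed seconds}."""
    times = {}
    for block, query_id, text in queries:
        if not enabled.get(block, False):     # block switches: RE, 1D, 2D, 3D, CH
            continue
        start = time.perf_counter()
        execute(text)
        times[query_id] = time.perf_counter() - start
    return times


def performance_test(queries, execute, enabled, nrun):
    """Cold run executed once, then nrun warm runs, all timed separately."""
    cold = run_workload(queries, execute, enabled)
    warm = [run_workload(queries, execute, enabled) for _ in range(nrun)]
    return cold, warm


def summarize(times):
    """Global, average, minimum, maximum and standard deviation of one run."""
    values = list(times.values())
    return {
        "global": sum(values),
        "average": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.pstdev(values),
    }
```

Setting nrun to 0 yields a cold run only, the configuration used in the sample experiments of Section 5. Keeping per-query times rather than a single aggregate is what leaves room for deriving composite metrics afterwards.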

5 Sample Experiments 

To illustrate XWeB's usage, we compare in this section a sample of XML-native
DBMSs, namely BaseX, eXist, Sedna, X-Hive and xindice. We focus on
XML-native systems in these experiments because they support the formulation
of decision support XQueries that include join operations, which are much more
difficult to achieve in XML-compatible relational DBMSs. In these systems, XML
documents are indeed customarily stored in table rows, and XQueries are
embedded in SQL statements that target one row/document, making joins between
XML documents difficult to express and inefficient.

Our experiments atomize the execution protocol from Section 4.3, on one
hand to separately outline how its steps perform individually and, on the other
hand, to highlight performance differences among the studied systems. Moreover,
we vary data warehouse size (expressed in number of facts) in these experiments,
to show how the studied systems scale up. Table 2 provides the correspondence
between the number of facts, parameters SF and D, and warehouse size in
kilobytes. Note that warehouse sizes are small because most of the studied systems
do not scale up on the hardware configuration we use (a Pentium 2 GHz PC
with 1 GB of main memory and an IDE hard drive running under Windows XP).
The possibility of missing values and element reordering is also disregarded in
these preliminary experiments, i.e., Pm = Po = 0.

5.1 Load Test 

Figure 7 represents loading time with respect to data warehouse size. We can
cluster the studied systems in three classes. BaseX and Sedna feature the best
loading times. BaseX is indeed specially designed for full-text storage and
allows compact and high-performance database storage, while Sedna divides
well-formed XML documents into parts of any convenient size before loading them
into a database using specific statements of the Data Manipulation Language.
Both these systems load data about twice as fast as X-Hive and xindice, which
implement specific numbering schemes that optimize data access, but require
more computation at storage time, especially when XML documents are bulky.
Finally, eXist is about twice as slow as X-Hive and xindice because, in
addition to the computation of a numbering scheme, it builds document, element
and attribute indexes at load time.

BaseX: http://www.inf.uni-konstanz.de/dbis/basex/
eXist: http://exist.sourceforge.net
Sedna: http://www.modis.ispras.ru/sedna/
X-Hive: http://www.emc.com/domains/x-hive/
xindice: http://xml.apache.org/xindice/



Table 2. Total size of XML documents 

5.2 Performance Test

In this set of experiments, we measure query execution time with respect to
data warehouse size. Since we rapidly test the limits of the studied systems, we
only and separately evaluate the response of reporting, 1-dimension cube, and
complex hierarchy-based queries, respectively. In terms of workload parameters,
RE = 1D = CH = TRUE and 2D = 3D = FALSE. Moreover, we stop time
measurement when workload execution time exceeds three hours. Finally, since
we perform atomic performance tests, they are only cold runs (i.e., NRUN = 0).

Figure 8 represents the execution time of reporting queries (RE) with respect
to warehouse size. Results clearly show that X-Hive's claimed scalability capa- 
bility is effective, while the performance of other systems degrades sharply when 
warehouse size increases. We think this is due to X-Hive's specifically designed 
XProc query Engine (a pipeline engine), while Sedna and BaseX are specially 
designed for full-text search and do not implement efficient query engines for 
structural query processing. Finally, eXist and xindice are specifically adapted 
to simple XPath queries processed on a single document and apparently do not 
suit complex querying needs. 

In Figure 9, we plot the execution time of 1D cube queries (1D) with respect
to warehouse size. We could only test Sedna and X-Hive here, the other systems
being unable to execute this workload in a reasonable time (less than three
hours). X-Hive appears the most robust system in this context. This is actually
why we do not push toward the 2D and 3D performance tests. Only X-Hive
is able to execute these queries. With other systems, execution time already
exceeds three hours for one single query. The combination of join and grouping
operations (which induce further joins in XQuery) that are typical in decision
support queries should thus be the subject of urgent optimization.

Finally, Figure 10 features the execution time of complex hierarchy-based
queries (CH) with respect to warehouse size. In this test, we obtained results 
only with X-Hive, Sedna and BaseX. Again, X-Hive seems the only XML-native 

Fig. 7. Load test results



DBMS to be able to scale up with respect to warehouse size when multiple join 
operations must be performed. 



6 Conclusion and Perspectives 



When designing XWeB, which is to the best of our knowledge the first XML 
decision support benchmark, we aimed at meeting the four key criteria that 
make a "good" benchmark according to Jim Gray [8]. Relevance means the 
benchmark must answer various engineering needs. This is why we chose to base 
our work on a TPC standard. We also introduced more tunability, both at schema 
and workload levels, to adapt to the reality of XML DBMSs. Portability means 
the benchmark must be easy to implement on different systems. To this aim, we 
implemented XWeB with the Java language, which allows connecting to most XML
DBMSs through APIs (we used the very popular XML:DB API). Scalability means
it must be possible to benchmark small and large databases, and to scale up
the benchmark, which is achieved by inheriting from the SF parameter. Further
tuning is achieved through the density (D) parameter. Finally, simplicity
means that the benchmark must be understandable, otherwise it will not be
credible or used. This is why we elected to base XWeB on TPC-H rather than
TPC-DS or DWEB.

In this paper, we also illustrated XWeB's relevance through several
experiments aimed at comparing the performance of five XML-native DBMSs.
Although basic and more focused on demonstrating XWeB's features than
comparing the studied systems in depth, they highlight X-Hive as the most scalable



XML:DB API: http://xmldb-org.sourceforge.net/xapi/




Fig. 8. RE performance test results 



system, while full-text systems such as BaseX seem to feature the best data stor- 
age mechanisms. Due to equipment limitations, we remain at small scale factors, 
but we believe our approach can be easily followed for larger scale factors. We 
also show the kind of decision support queries that require urgent optimization: 
namely, cubing queries that perform join and grouping operations on a fact doc- 
ument and dimension documents. In this respect, XWeB had previously been 
successfully used to experimentally validate indexing and view materialization 
strategies for XML data warehouses [13]. 

Finally, a raw, preliminary version of XWeB (warehouse, workload, Java
interface and source code) is freely available online as an Eclipse project. A
more streamlined version is in the pipeline and will be distributed under a Creative
Commons licence.

After having designed a benchmark modeling business data (which XWeB 
aims to be), it would be very interesting in future research to also take into 
account the invaluable business information that is stored into unstructured 
documents. Hence, including features from, e.g., XBench into XWeB would help 
improve a decision support benchmark's XML specificity. 

Since the XQuery Update Facility has been issued as a candidate
recommendation by the W3C [6] and is now implemented in many XML DBMSs (e.g.,
eXist, BaseX, xDB, DB2/PureXML, Oracle Berkeley DB XML...), it will also
be important to include update operations in our workload. The objective is not

XWeB: http://ena-dc.univ-lyon2.fr/download/xweb.zip
Eclipse: http://www.eclipse.org
Creative Commons licence: http://creativecommons.org/licenses/by-nc-sa/2.5/




Fig. 9. 1D performance test results



necessarily to feature full ETL testing capability, which would presumably
necessitate a dedicated benchmark (Section 4.1), but to improve workload relevance
with refreshing operations that are common in data warehouses, in order to
challenge system response and management of redundant performance optimization
structures such as indexes and materialized views.

The core XWeB workload (i.e., read accesses) shall also be given attention.
It has indeed been primarily designed to test scaling up. Filter factor analysis of
queries [16] and experimental feedback should help tune it and broaden its scope
and representativeness. Moreover, we mainly focus on cube-like aggregation queries
in this version. Working on the output cubes from these queries might also be
interesting, i.e., by applying other usual XOLAP operators such as slice & dice
or rotate that are easy to achieve in XQuery [9].

Finally, other performance metrics should complement response time. Beyond
composite metrics such as TPC benchmarks', we should not only test system
response, but also the quality of results. As we underlined in Section 4.3, complex
grouping XQueries may return false answers. Hence, query result correctness
or overall correctness rate could be qualitative metrics. Since several XQuery
extension proposals do already support grouping queries and OLAP operators
[2,9,12,26], we definitely should be able to test systems in this regard.